Identifying group and individual-level risk factors via risk-driven patient stratification

ABSTRACT

Systems and methods for individual risk factor identification include identifying common risk factors for one or more risk targets from population data. Individuals are stratified into clusters based upon the common risk factors. A discriminability of each of the common risk factors is determined, using a processor, for a target cluster using individual data of the target cluster to provide re-ranked common risk factors as individual risk factors for the target cluster, such that the discriminability is a measure of how a risk factor discriminates its cluster from other clusters.

RELATED APPLICATION INFORMATION

This application is a Continuation application of co-pending U.S. patent application Ser. No. 13/632,659 filed Oct. 1, 2012, incorporated herein by reference in its entirety.

BACKGROUND

1. Technical Field

The present invention relates to risk factor identification, and more particularly to identifying group-level and individual-level risk factors via risk-driven patient stratification.

2. Description of the Related Art

As more clinical information with increasing diversity becomes available for analysis, a large number of features can be constructed and leveraged for predictive modeling. The ability to identify risk factors related to an adverse health condition (e.g., congestive heart failure) is very important for improving healthcare quality and reducing cost. The identification of risk factors may allow for the early detection of the onset of diseases so that aggressive intervention may be taken to slow or prevent costly and potentially life threatening conditions.

In personalized care management scenarios, it is common for two patients or groups of patients to have similar risk scores, but based on different risk factors. Conventionally, risk factor identification utilizes feature ranking methods to rank features that characterize the global utility of features. However, methods based on general population data will only yield common risk factors and do not address individual differences of patients.

SUMMARY

A method for individual risk factor identification includes identifying common risk factors for one or more risk targets from population data. Individuals are stratified into clusters based upon the common risk factors. A discriminability of each of the common risk factors is determined, using a processor, for a target cluster using individual data of the target cluster to provide re-ranked common risk factors as individual risk factors for the target cluster, such that the discriminability is a measure of how a risk factor discriminates its cluster from other clusters.

A method for individual risk factor identification includes identifying common risk factors for one or more risk targets from population data. Individuals are stratified into clusters based upon the common risk factors. The clusters are identified as one of a plurality of risk levels including at least one high-risk cluster and at least one low-risk cluster. A discriminability of each of the common risk factors is determined, using a processor, for a target cluster using individual data of the target cluster to provide re-ranked common risk factors as individual risk factors for the target cluster, such that the discriminability is a measure of how a risk factor discriminates its cluster from other clusters. The other clusters include at least one of other high-risk clusters, low-risk clusters, and a general population.

A method for individual risk factor identification includes identifying common risk factors for one or more risk targets from population data. Individuals are stratified into clusters based upon the common risk factors. The clusters are identified as one of a plurality of risk levels including at least one high-risk cluster and at least one low-risk cluster. A discriminability of each of the common risk factors is identified, using a processor, for a target cluster using individual data of the target cluster to provide re-ranked common risk factors, such that the discriminability is a measure of how a risk factor discriminates its cluster from other clusters. The other clusters include at least one of other high-risk clusters, low-risk clusters, and a general population. Each of the re-ranked common risk factors is validated using the individual data to provide individual risk factors for the target cluster by filtering out the common risk factors that do not indicate actual risk.

A system for individual risk factor identification includes a selection module configured to identify common risk factors for one or more risk targets from population data. A clustering module is configured to stratify individuals into clusters based upon the common risk factors. A ranking module is configured to determine, using a processor, a discriminability of each of the common risk factors for a target cluster using individual data of the target cluster to provide re-ranked common risk factors as individual risk factors for the target cluster, such that the discriminability is a measure of how a risk factor discriminates its cluster from other clusters.

A system for individual risk factor identification includes a selection module configured to identify common risk factors for one or more risk targets from population data. A clustering module is configured to stratify individuals into clusters based upon the common risk factors. A group identification module is configured to identify the clusters as one of a plurality of risk levels including at least one high-risk cluster and at least one low-risk cluster. A ranking module is configured to determine, using a processor, a discriminability of each of the common risk factors for a target cluster using individual data of the target cluster to provide re-ranked common risk factors as individual risk factors for the target cluster, such that the discriminability is a measure of how a risk factor discriminates its cluster from other clusters. The other clusters include at least one of other high-risk clusters, low-risk clusters, and a general population.

These and other features and advantages will become apparent from the following detailed description of illustrative embodiments thereof, which is to be read in connection with the accompanying drawings.

BRIEF DESCRIPTION OF DRAWINGS

The disclosure will provide details in the following description of preferred embodiments with reference to the following figures wherein:

FIG. 1 is a block/flow diagram showing a high level overview of a system/method for individual risk factor identification in accordance with one illustrative embodiment;

FIG. 2 is a block/flow diagram showing a system for individual risk factor identification in accordance with one illustrative embodiment; and

FIG. 3 is a block/flow diagram showing a method for individual risk factor identification in accordance with one illustrative embodiment.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

In accordance with the present principles, systems and methods for identifying individual-level risk factors are provided. Individual data and population data may be received as input. Individual data may include data for a target group or cluster. In a preferred embodiment, the target cluster is representative of a target individual or patient, e.g., to be treated or examined. Individual data may include, e.g., electronic health records, questionnaire data, genetic information, etc.

Using population data, common (i.e., global) risk factors for one or more specific risk targets (e.g., diabetes) are identified. Preferably, each risk factor is identified as being positively or negatively correlated with the risk target. Using the identified risk factors, patients from the population data are stratified into clusters. Stratifying patients into clusters may include applying, e.g., hierarchical clustering, k-means clustering, 2-step clustering, etc. Other methods of clustering are also contemplated. Each cluster is identified as being high-risk or low-risk. This may be based on the proportion of at-risk patients in each cluster.

The identified risk factors are re-ranked based upon the importance of each risk factor to the target cluster. The importance can be quantified for each risk factor as how much the risk factor discriminates its cluster (i.e., the target cluster) from the remaining population. A number of comparison configurations may be applied. In one embodiment, the target cluster may be compared against all other high-risk clusters. In another embodiment, the target cluster may be compared against all low-risk clusters. In still another embodiment, the target cluster may be compared against the general population. Other comparison configurations are also contemplated. Using one of the comparison configurations, discriminability may be determined. In one embodiment, discriminability may be determined by calculating how much each factor contributes to the training of a classifier. In another embodiment, discriminability may be determined by calculating how much the distribution of each factor differs in the target cluster as compared to the pertinent clusters for the selected comparison configuration.

Risk factors that do not indicate actual risk at the local level may be filtered out. The remaining risk factors are outputted as individual risk factors. The individual risk factors identify the primary risk factors for a target cluster or target patient. Advantageously, the individual risk factors may be utilized to, e.g., customize a personalized care management process or may be displayed for clinical decision support at the point-of-care or for patient education.

As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method or computer program product. Accordingly, aspects of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “circuit,” “module” or “system.” Furthermore, aspects of the present invention may take the form of a computer program product embodied in one or more computer readable medium(s) having computer readable program code embodied thereon.

Any combination of one or more computer readable medium(s) may be utilized. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.

A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing. Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).

Aspects of the present invention are described below with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks. The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.

The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

Referring now to the drawings in which like numerals represent the same or similar elements and initially to FIG. 1, a block/flow diagram showing a system/method for a high-level overview of individual risk factor identification 100 is illustratively depicted in accordance with one embodiment. In block 102, data is received as an input, including individual data and population data. Individual data includes data for a cluster of one or more individuals (e.g., patients), which may be representative of a target individual. Individual data may include, e.g., electronic medical records (e.g., diagnosis, lab results, medication, hospitalization records, etc.), personal lifestyle questionnaire data, personal genetic information, and the like.

In block 104, population data is used to identify common risk factors for a specific risk target (e.g., diabetes, congestive heart failure, etc.). The identified risk factors are preferably ranked based on global utility. For each identified risk factor, it is indicated whether the risk factor is positively or negatively correlated with the specific risk target.

In block 106, individuals (from the population data) are stratified into clusters based on the identified risk factors. The objective is to group individuals into clusters to reflect the differences in their characteristics. Patients within the same cluster are exposed to more similar risk factors than patients in different clusters.

In block 108, risk groups are identified from the clusters. High and low risk groups are identified based upon the proportion of at-risk patients in each cluster. In block 110, the set of risk factors identified in block 104 is re-ranked based on how much each risk factor discriminates its cluster from other clusters.

In block 112, risk factors are validated. Risk factors that do not indicate actual risk at the individual/group level are filtered out. In block 114, individual/group risk factors are outputted. The individual/group risk factors may be used, e.g., in a personalized care management process or may be displayed in a user interface or dashboard.

Referring now to FIG. 2, a block/flow diagram showing a system 200 for individual risk factor identification is illustratively depicted in accordance with one embodiment. The system 200 identifies risk factors for a cluster of individuals (e.g., patients). The cluster of individuals may preferably represent a target individual, such as, e.g., a particular patient to be examined. The system may include a workstation or console 202 from which procedures (e.g., medical examination) may be performed. The system 202 preferably includes one or more processors 210 and memory 216 for storing programs, applications and other data. It should be understood that the functions and components of system 200 may be integrated into one or more workstations or systems.

System 202 may include one or more displays 212 for viewing. The display 212 may also permit a user (e.g., physician, care coordinator, care giver, etc.) to interact with the system 202 and its components and functions. This is further facilitated by a user interface 214, which may include a keyboard, mouse, joystick, or any other peripheral or control to permit user interaction with the system 202.

System 202 receives input 204, which may include individual data 206 and population data 208. Individual data 206 may include data for a cluster of individuals, which may represent a target individual. For example, individual data 206 may include electronic medical records such as diagnoses, lab results, medication and hospitalization records, questionnaire data (e.g., personal lifestyle questionnaire), personal genetic information, etc. for a cluster of individuals. Other data are also contemplated.

Memory 216 may include selection module 218 configured to identify common (i.e., global) risk factors for one or more specific risk targets (e.g., diabetes, congestive heart failure, etc.) from population data 208. Risk targets are preferably received as an input 204. Selection module 218 may identify common risk factors using feature selection based upon, e.g., filters, wrappers, embedded methods, and ensemble voting. Filter based feature selection may include, e.g., scoring with statistical measures (chi-squared (χ²), etc.), scoring with information theoretic measures (information gain, symmetrical uncertainty, etc.), scoring with feature contribution evaluation (oneR, random forest, relief-f, etc.), and the like. Wrapper based feature selection may include, e.g., wrappers with search methods (breadth first search, exhaustive search, forward search, backward search, hill-climbing search, etc.), wrappers with feature subset quality checking measures (correlation-based feature selection, consistency-based feature selection, etc.), and the like. Embedded based feature selection may be based on, e.g., decision trees, logistic regression, support vector machines, etc. Ensemble voting based feature selection selects those features that more than one selector has voted for. Other selection methods and configurations are also contemplated.

The selection module 218 ranks identified common risk factors using objective functions that characterize the global utility (i.e., importance) of the risk factors. Preferably, the selection module 218 identifies whether a risk factor is positively or negatively correlated with a specific risk target. For example, positively correlated risk factors for congestive heart failure may include age, smoking, blood pressure, alcohol consumption, etc., while negatively correlated risk factors for congestive heart failure may include high-density lipoprotein cholesterol, diet control, etc. The selection module 218 selects the top n risk factors, where n is any positive integer.
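By way of illustration only, the ranking and top-n selection described above may be sketched as follows. The sketch assumes a filter-style score based on the absolute Pearson correlation with a binary risk target, which is only one of many possible objective functions; the function names, the data layout, and the example values are hypothetical and are not taken from the disclosure.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no information
    return cov / sqrt(var_x * var_y)

def select_common_risk_factors(population, outcome, n):
    """Return the top-n factors as (name, score, sign) tuples.

    population: dict mapping factor name -> list of values, one per individual
    outcome:    list of 0/1 labels for the risk target, one per individual
    """
    scored = []
    for name, values in population.items():
        r = pearson(values, outcome)
        sign = "positive" if r >= 0 else "negative"
        scored.append((name, abs(r), sign))
    # Assumed objective function: absolute correlation as global utility.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:n]

# Hypothetical toy population of six individuals.
population = {
    "age":     [70, 65, 72, 40, 35, 38],
    "hdl":     [35, 38, 33, 60, 65, 62],
    "shoe_sz": [9, 10, 8, 10, 8, 9],
}
outcome = [1, 1, 1, 0, 0, 0]
top = select_common_risk_factors(population, outcome, n=2)
```

In this toy data, age and HDL are retained with opposite correlation signs, while the uninformative third column is dropped.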

Memory 216 may also include clustering module 220 configured to stratify individuals into clusters using the common risk factors selected by selection module 218. In one embodiment, clustering module 220 applies hierarchical clustering with a known number of clusters k. Each individual may initially be assigned to its own cluster. The distance between each pair of clusters represents their similarity based on the common risk factors. The distance may be represented as a distance metric, such as, e.g., Euclidean distance, Mahalanobis distance, Manhattan distance, etc. Other metrics are also contemplated. The closest (i.e., most similar) pair of clusters is merged into a single cluster. Cluster similarity may be based upon, e.g., single-link clustering, complete-link clustering, average-link clustering, etc. Distances between the newly formed cluster and the remaining clusters are then computed and the process is repeated until a specified number of clusters k remains. It is to be understood that clustering module 220 may apply other clustering methods, such as, e.g., k-means clustering, 2-step clustering, etc. Clustering module 220 may apply soft assignment techniques, such that individuals may be assigned to multiple clusters.
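As a non-limiting sketch, the agglomerative procedure described above may be written as follows, assuming Euclidean distance and average-link cluster similarity. The function names and the example points are hypothetical, and the naive pairwise search is chosen for readability rather than efficiency.

```python
from math import dist  # Euclidean distance, available in Python 3.8+

def average_link(a, b, points):
    """Mean pairwise Euclidean distance between two clusters of point indices."""
    total = sum(dist(points[i], points[j]) for i in a for j in b)
    return total / (len(a) * len(b))

def agglomerate(points, k):
    """Merge the closest pair of clusters until only k clusters remain."""
    clusters = [[i] for i in range(len(points))]  # each individual starts alone
    while len(clusters) > k:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = average_link(clusters[p], clusters[q], points)
                if best is None or d < best[0]:
                    best = (d, p, q)
        _, p, q = best
        clusters[p] = clusters[p] + clusters[q]  # merge q into p
        del clusters[q]
    return [sorted(c) for c in clusters]

# Hypothetical individuals described by two common risk factors each.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (9.0, 0.0)]
clusters = agglomerate(points, k=3)
```

Swapping `average_link` for a minimum or maximum over the pairwise distances would give single-link or complete-link behavior, respectively.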

Memory 216 also includes group identification module 222 configured to identify the clusters at a plurality of risk levels. In a preferred embodiment, clusters are identified as high-risk and low-risk clusters. Risk levels of a cluster may be identified based upon the proportion of at-risk individuals in each cluster. In one embodiment, clusters are identified as high-risk where the proportion of at-risk individuals is above a predefined risk threshold (e.g., a cluster with a proportion of at-risk individuals >0.7 is identified as high-risk). At-risk individuals may be identified using a classifier previously trained from a similar patient pool to assign risk scores to each individual. Based on the risk score, each individual of the cluster may be identified as at-risk (e.g., an individual with a risk score >0.5 is identified as at-risk). In another embodiment, the risk status (e.g., at-risk) of each individual is known, such as, e.g., in a training phase. Other configurations of group identification module 222 to identify clusters at a plurality of risk levels are also contemplated.
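The thresholding described above may be illustrated with the following minimal sketch. It uses the example values from the text (a risk score above 0.5 marks an individual as at-risk, and a proportion above 0.7 marks a cluster as high-risk) and assumes that risk scores have already been produced by a previously trained classifier, which is not shown. All identifiers and data are hypothetical.

```python
def label_clusters(clusters, risk_scores, at_risk_cutoff=0.5, high_risk_threshold=0.7):
    """Label each cluster "high-risk" or "low-risk".

    clusters:    dict mapping cluster id -> list of individual ids
    risk_scores: dict mapping individual id -> score in [0, 1], assumed to come
                 from a previously trained classifier (not shown here)
    """
    labels = {}
    for cluster_id, members in clusters.items():
        at_risk = sum(1 for m in members if risk_scores[m] > at_risk_cutoff)
        proportion = at_risk / len(members)
        labels[cluster_id] = (
            "high-risk" if proportion > high_risk_threshold else "low-risk"
        )
    return labels

# Hypothetical clusters and risk scores.
clusters = {"A": ["p1", "p2", "p3", "p4"], "B": ["p5", "p6", "p7", "p8"]}
risk_scores = {"p1": 0.9, "p2": 0.8, "p3": 0.6, "p4": 0.7,
               "p5": 0.2, "p6": 0.6, "p7": 0.1, "p8": 0.4}
labels = label_clusters(clusters, risk_scores)
```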

Memory 216 includes ranking module 224 configured to re-rank the common risk factors (from selection module 218) based upon how important each risk factor is to its particular risk cluster. The importance can be quantified as how much a risk factor discriminates its local cluster from other clusters. The discriminability of each risk factor for its cluster may be measured by comparing its cluster (i.e., the target cluster) with other clusters or individuals based on a number of different comparison configurations. In one embodiment, the target cluster is compared with all other high-risk clusters. In another embodiment, the target cluster is compared with all low-risk clusters. In yet another embodiment, the target cluster is compared with the general population. Other comparison configurations are also contemplated.
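For illustration, the three comparison configurations described above may be expressed as a helper that assembles the reference group for a given target cluster. The configuration names are hypothetical labels, and treating the general population as every individual, including the members of the target cluster, is an assumption made for this sketch; the text does not specify it.

```python
def reference_members(target_id, clusters, labels, configuration):
    """Return the individuals the target cluster is compared against.

    configuration: "other_high_risk", "low_risk", or "general_population"
                   (hypothetical names for the three embodiments in the text)
    """
    if configuration == "other_high_risk":
        chosen = [c for c in clusters
                  if c != target_id and labels[c] == "high-risk"]
    elif configuration == "low_risk":
        chosen = [c for c in clusters
                  if c != target_id and labels[c] == "low-risk"]
    elif configuration == "general_population":
        chosen = list(clusters)  # assumption: everyone, target cluster included
    else:
        raise ValueError(f"unknown comparison configuration: {configuration}")
    return sorted(m for c in chosen for m in clusters[c])

# Hypothetical clusters with previously assigned risk levels.
clusters = {"A": ["p1", "p2"], "B": ["p3"], "C": ["p4", "p5"]}
labels = {"A": "high-risk", "B": "high-risk", "C": "low-risk"}
vs_high = reference_members("A", clusters, labels, "other_high_risk")
vs_low = reference_members("A", clusters, labels, "low_risk")
vs_all = reference_members("A", clusters, labels, "general_population")
```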

Based upon a comparison configuration, risk factor discriminability is measured. In one embodiment, risk factor discriminability is measured by calculating how much each risk factor contributes to the training of a classifier (e.g., logistic regression model, support vector machine, etc.) for a particular comparison configuration, while accounting for the bias introduced by the classifier. The classifier may be trained to distinguish the target cluster from the other clusters specified in the comparison configuration. Preferably, the classifier is trained via, e.g., cross validation. The feature contribution information (i.e., weightings) exhibited in the training process (e.g., forward selection, backward selection, etc.) is used to rank features for selection purposes. The weightings of each risk factor learned in the classifier training indicate the importance of each of the risk factors in discriminating the target cluster from other clusters. The best-performing subset of features is selected.
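A simplified sketch of the classifier-based measure follows, assuming a logistic regression model fitted by plain batch gradient descent, with the absolute value of each learned weight taken as the discriminability of the corresponding risk factor. The sketch assumes that features are already on comparable scales, and it omits cross validation, subset selection, and any correction for classifier bias. All names, hyperparameters, and data are hypothetical.

```python
from math import exp

def train_logistic(rows, labels, epochs=500, lr=0.1):
    """Fit logistic regression by batch gradient descent; return the weights."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            z = bias + sum(w * v for w, v in zip(weights, x))
            error = 1.0 / (1.0 + exp(-z)) - y  # predicted probability minus label
            for j, v in enumerate(x):
                grad_w[j] += error * v
            grad_b += error
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights

def rank_by_weight(names, rows, labels):
    """Rank risk factors by the magnitude of their learned weight."""
    weights = train_logistic(rows, labels)
    return sorted(zip(names, weights), key=lambda nw: abs(nw[1]), reverse=True)

# Label 1 marks members of the target cluster, label 0 the comparison group.
names = ["smoking", "age", "hdl"]
rows = [(1.0, 0.2, -0.1), (1.0, -0.3, 0.2), (1.0, 0.1, 0.0),
        (0.0, 0.3, 0.1), (0.0, -0.2, -0.2), (0.0, -0.1, 0.0)]
labels = [1, 1, 1, 0, 0, 0]
ranking = rank_by_weight(names, rows, labels)
```

In this toy data only smoking separates the target cluster from the comparison group, so it receives the largest weight.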

In another embodiment, risk factor discriminability is measured by calculating how much the distribution of each risk factor differs between the target cluster and the clusters or individuals pertaining to the selected comparison configuration. The distribution may be based upon a frequency count of each risk factor using statistical methods (e.g., chi-squared (χ²), log likelihood ratio (LLR), etc.) and information theoretic methods (e.g., pointwise mutual information (PMI), information gain (IG), etc.). The distributions of risk factors may be compared based on, e.g., the sign test, Wilcoxon signed-rank test, Spearman's rank-order correlation, Kendall's tau correlation, etc. Other methods are also contemplated.
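As one hedged example of the distribution-based measure, the following sketch computes the Pearson chi-squared statistic on a 2x2 frequency table (risk factor present or absent, target cluster or comparison group) and ranks the risk factors by it. The sketch assumes binary risk factor indicators; the function names and example counts are hypothetical.

```python
def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 0.0  # a degenerate table carries no evidence of a difference
    return n * (a * d - b * c) ** 2 / denominator

def rank_by_chi_squared(target, reference):
    """Rank binary risk factors by how differently they occur in the two groups.

    target, reference: dict mapping factor name -> list of 0/1 indicators
    """
    scores = {}
    for name in target:
        a = sum(target[name])              # present in target cluster
        b = len(target[name]) - a          # absent in target cluster
        c = sum(reference[name])           # present in comparison group
        d = len(reference[name]) - c       # absent in comparison group
        scores[name] = chi_squared_2x2(a, b, c, d)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical indicator data for five target and six reference individuals.
target = {"smoking": [1, 1, 1, 1, 0], "hypertension": [1, 0, 1, 0, 1]}
reference = {"smoking": [0, 0, 1, 0, 0, 0], "hypertension": [1, 0, 1, 0, 1, 0]}
ranked = rank_by_chi_squared(target, reference)
```

With very small counts such as these, the statistic serves only as a ranking score and should not be read as a significance test.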

Preferably, the best combination of the comparison configuration and discriminability measure is applied to provide local risk factors (i.e., re-ranked common risk factors). The best combination may represent the combination that results in a local risk factor ranking that discerns the at-risk cluster from other clusters. The best combination may be determined by comparing the re-ranked common risk factors for each combination with the ranked common risk factors determined by selection module 218. For example, goodness-of-fit measures may be used to compare re-ranked common risk factors with the ranked common risk factors. Other measures of comparing risk factor rankings are also contemplated.
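The text does not name a particular measure for comparing two rankings. Purely as an assumed example, the sketch below computes Spearman's rank correlation between the global ranking and a re-ranked list over the same risk factors, using the standard formula for rankings without ties. How such a score would be used to choose the best combination is left open here, as it is in the text. The names and rankings are hypothetical.

```python
def spearman_rho(ranking_a, ranking_b):
    """Spearman rank correlation between two orderings of the same items (no ties)."""
    if set(ranking_a) != set(ranking_b) or len(ranking_a) < 2:
        raise ValueError("rankings must order the same set of at least two items")
    position_b = {item: i for i, item in enumerate(ranking_b)}
    n = len(ranking_a)
    d_squared = sum((i - position_b[item]) ** 2
                    for i, item in enumerate(ranking_a))
    return 1 - 6 * d_squared / (n * (n * n - 1))

# Hypothetical global ranking and one re-ranked list for a target cluster.
global_ranking = ["age", "smoking", "blood_pressure", "hdl"]
local_ranking = ["smoking", "age", "blood_pressure", "hdl"]
rho = spearman_rho(global_ranking, local_ranking)
```

A value of 1 indicates identical rankings and a value of -1 indicates fully reversed rankings.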

Memory 216 may include validation module 226 configured to filter out re-ranked common risk factors that do not indicate actual risk for the target cluster to provide individual risk factors for the target cluster. The validation module 226 may be implemented as a set of rules to filter out common risk factors that may be discriminative of the target cluster but whose presence does not indicate a risk increase for the specific risk target. For example, high HDL may actually indicate risk reduction for a patient and is therefore not a risk factor. In another embodiment, the positive/negative correlation of a particular risk factor (identified by the selection module 218) is compared against the data in the target individual/risk group. Other filtering criteria are also contemplated.
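One possible rule of the kind described above is sketched below. It assumes that a factor indicates actual risk for the target cluster only when the cluster deviates from the population in the risk-increasing direction, i.e., elevated for a positively correlated factor or reduced for a negatively correlated one. This interpretation, the use of mean values, and all names and numbers are assumptions made for illustration.

```python
def validate_risk_factors(reranked, correlation_sign, target_mean, population_mean):
    """Keep re-ranked factors whose level in the target cluster points toward risk.

    reranked:         list of factor names, most discriminative first
    correlation_sign: dict mapping factor name -> "positive" or "negative"
    target_mean:      dict mapping factor name -> mean value in the target cluster
    population_mean:  dict mapping factor name -> mean value in the population
    """
    kept = []
    for name in reranked:
        elevated = target_mean[name] > population_mean[name]
        reduced = target_mean[name] < population_mean[name]
        sign = correlation_sign[name]
        if (sign == "positive" and elevated) or (sign == "negative" and reduced):
            kept.append(name)  # the deviation indicates increased risk
    return kept

# Hypothetical target cluster with high HDL, heavy smoking, and below-average age.
reranked = ["hdl", "smoking", "age"]
correlation_sign = {"hdl": "negative", "smoking": "positive", "age": "positive"}
target_mean = {"hdl": 65.0, "smoking": 0.8, "age": 50.0}
population_mean = {"hdl": 50.0, "smoking": 0.3, "age": 55.0}
individual_risk_factors = validate_risk_factors(
    reranked, correlation_sign, target_mean, population_mean
)
```

Here high HDL is discriminative of the cluster but protective, so it is filtered out, consistent with the HDL example in the text.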

The risk factor identification system 202 may provide output 228. Output 228 may include individual risk factors 230 for a cluster of individuals, which may represent a target individual or patient. In one application, individual risk factors 230 may be applied in a personal care management process for, e.g., clinical decision support at the point-of-care. In another application, individual risk factors 230 may be displayed using display 212 and/or user interface 214 to, e.g., customize healthcare plans or tailor patient education. Other applications are also contemplated.

Referring now to FIG. 3, a flow diagram showing a method for risk factor identification 300 is illustratively depicted in accordance with one embodiment. In block 302, individual data and population data are inputted. Individual data may include data for a cluster of individuals (e.g., patients). Preferably, the cluster of individuals is representative of a target individual or patient. Individual data may include, for example, electronic medical records (e.g., diagnosis, lab results, medication, hospitalization records), questionnaire data (e.g., personal lifestyle questionnaire), genetic information, and the like.

In block 304, common risk factors are identified and ranked for one or more risk targets from the population data. Risk targets, such as, e.g., diabetes, congestive heart failure, etc., are preferably received as an input. Common risk factor identification may be, e.g., filter based, wrapper based, embedded based, ensemble voting based, etc. Risk factors are ranked using objective functions that characterize global utility of the risk factors. In a preferred embodiment, the positive or negative correlation of each risk factor is identified for the specific risk target. The top n risk factors are selected, where n is any positive integer. Other methods of common risk factor identification and ranking are also contemplated.

In block 306, individuals are stratified into clusters using the identified common risk factors. Clustering may include applying hierarchical clustering, k-means clustering, 2-step clustering, etc. Other clustering methods are also contemplated. The clustering methods may be based on distance metrics, such as, e.g., the Euclidean distance, Mahalanobis distance, Manhattan distance, etc. Inherently, patients within a cluster are exposed to more similar risk factors than patients in different clusters.

In block 308, clusters are identified as one of a plurality of risk levels. Preferably, the plurality of risk levels includes a high-risk cluster and a low-risk cluster. Clusters may be identified based upon the proportion of at-risk patients in each cluster. In one embodiment, high-risk clusters are identified as clusters with a proportion of at-risk patients greater than a predefined risk threshold. At-risk patients may be determined using a classifier to assign a risk score to each individual. Classifiers may be previously trained using a similar patient pool. Risk scores may be used to determine whether an individual is at-risk (e.g., an individual with a risk score >0.5 is identified as at-risk). In another embodiment, the risk status of individuals may already be known (e.g., in a training phase). Other methods of cluster risk identification are also contemplated.

In block 310, a discriminability of each of the common risk factors may be determined for a target cluster using the individual data of the target cluster to provide re-ranked common risk factors. The discriminability may be determined such that the discriminability is a measure of how a risk factor discriminates its cluster (i.e., the target cluster) from other clusters. The discriminability may be measured using a number of different comparison configurations. In block 312, a comparison configuration is selected. In one embodiment, the target cluster is compared with all other high-risk clusters. In another embodiment, the target cluster is compared with all low-risk clusters. In yet another embodiment, the target cluster is compared with the general population. Other comparison configurations are also contemplated.

Based upon the comparison configuration, in block 314, the discriminability of each risk factor may be determined. In one embodiment, discriminability is measured by calculating how much each risk factor contributes to the training of a classifier, such as, e.g., a logistic regression model, a support vector machine, etc. The classifier is preferably trained to distinguish the target cluster from the other clusters as specified in the comparison configuration. Training may be performed by, e.g., cross validation. The weightings learned in the classifier training are used to rank features for selection purposes. The weightings indicate the importance of each risk factor to its cluster (i.e., its discriminability). The best-performing subset of features is selected. In another embodiment, discriminability may be measured by calculating how much the distribution of each risk factor differs in the target cluster and pertinent clusters selected in the comparison configuration. A number of methods may be applied, such as statistical methods (e.g., chi-square statistics (χ²), log-likelihood ratio (LLR), etc.) and information theoretic methods (e.g., point-wise mutual information (PMI), etc.). Other methods of determining discriminability are also contemplated.
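As one possible sketch of the distribution-based embodiment of block 314, the following scores binary (present/absent) risk factors with the chi-square statistic of a 2x2 contingency table, with point-wise mutual information offered as an interchangeable measure. The restriction to binary factors, the function names, and the toy data are assumptions of this sketch.

```python
import math

def chi_square(a, b, c, d):
    """Chi-square statistic of the 2x2 table
    [[a, b], [c, d]] = [[target with, target without],
                        [reference with, reference without]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0

def pmi(a, b, c, d):
    """Point-wise mutual information (base 2) between 'factor present'
    and 'member of target cluster'."""
    n = a + b + c + d
    if a == 0:
        return float("-inf")  # factor never occurs in the target cluster
    return math.log2(a * n / ((a + b) * (a + c)))

def rank_factors(target_rows, reference_rows, factor_names, measure=chi_square):
    """Score each binary factor and return (names sorted by score, scores)."""
    scores = {}
    for j, name in enumerate(factor_names):
        a = sum(row[j] for row in target_rows)     # target, factor present
        c = sum(row[j] for row in reference_rows)  # reference, factor present
        scores[name] = measure(a, len(target_rows) - a,
                               c, len(reference_rows) - c)
    return sorted(scores, key=scores.get, reverse=True), scores

# Hypothetical 0/1 indicators: columns are diabetes, hypertension.
names = ["diabetes", "hypertension"]
target = [[1, 1], [1, 1], [1, 0], [1, 1]]
reference = [[0, 1], [0, 1], [0, 0], [1, 1]]
ranking, scores = rank_factors(target, reference, names)
```

The chi-square statistic is symmetric, so it flags factors that are under-represented in the target cluster as well as those that are over-represented; PMI is signed and may be preferred where the direction matters.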

Preferably, the best combination of the comparison configuration (block 312) and discriminability measure (block 314) is applied to provide re-ranked common risk factors. The best combination may represent the combination that results in re-ranked common risk factors that discern the at-risk target cluster from other clusters. The best combination may be determined by comparing re-ranked common risk factors for each combination with the ranked common risk factors determined in block 304. In one embodiment, goodness-of-fit measures may be used to compare re-ranked risk factors with the common risk factors. Other measures of comparing risk factor rankings are also contemplated.
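The comparison of rankings may be sketched as follows. Because no particular measure is specified above, this sketch substitutes Spearman's rank correlation as a stand-in for a goodness-of-fit measure; that choice, the function name, and the example rankings are assumptions. The sketch only scores each combination and does not decide which score counts as best.

```python
def spearman_rho(ranking_a, ranking_b):
    """Spearman rank correlation between two orderings of the same items.

    Assumes both rankings contain exactly the same items, with no ties,
    and at least two items.
    """
    n = len(ranking_a)
    position_b = {item: i for i, item in enumerate(ranking_b)}
    d_squared = sum((i - position_b[item]) ** 2
                    for i, item in enumerate(ranking_a))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Hypothetical common ranking (block 304) and two candidate re-rankings,
# keyed by (comparison configuration, discriminability measure).
common = ["age", "diabetes", "hypertension", "smoking"]
candidates = {
    ("low_risk", "chi_square"): ["diabetes", "age", "hypertension", "smoking"],
    ("general_population", "pmi"): ["smoking", "hypertension", "diabetes", "age"],
}
agreement = {combo: spearman_rho(common, ranked)
             for combo, ranked in candidates.items()}
```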

In block 316, the re-ranked common risk factors that do not indicate actual risk for the target cluster are filtered out to provide individual risk factors for the target cluster. Preferably, a set of rules is used to filter out risk factors that may be discriminative of the target cluster but do not indicate a risk increase in the risk target.
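One hypothetical rule for block 316 is sketched below: a factor is retained only if individuals in the target cluster who have the factor are at-risk more often than those who do not. The rule itself, the function name, the handling of a factor present in every individual, and the factor names are assumptions of this sketch; the actual set of rules is not specified above.

```python
def filter_risk_factors(ranked_factors, rows, at_risk, factor_index):
    """Keep, in ranked order, the binary factors associated with increased risk.

    rows holds 0/1 factor indicators per individual in the target cluster,
    at_risk holds 0/1 risk status per individual, and factor_index maps a
    factor name to its column in rows.
    """
    kept = []
    for name in ranked_factors:
        j = factor_index[name]
        with_factor = [r for row, r in zip(rows, at_risk) if row[j]]
        without_factor = [r for row, r in zip(rows, at_risk) if not row[j]]
        if not with_factor:
            continue  # factor absent from the cluster: nothing to indicate
        rate_with = sum(with_factor) / len(with_factor)
        # Assumption: if every individual has the factor, compare against 0.
        rate_without = (sum(without_factor) / len(without_factor)
                        if without_factor else 0.0)
        if rate_with > rate_without:
            kept.append(name)
    return kept

# Hypothetical target-cluster data: columns are diabetes, statin_use.
index = {"diabetes": 0, "statin_use": 1}
rows = [[1, 1], [1, 0], [0, 1], [0, 0]]
at_risk = [1, 1, 0, 0]
individual_factors = filter_risk_factors(["statin_use", "diabetes"],
                                         rows, at_risk, index)
```

In this toy data, statin_use is dropped because the at-risk rate is the same with and without it, while diabetes is retained.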

In block 318, individual risk factors are outputted for the target cluster. The target cluster preferably represents a target individual or patient. The individual risk factors may be, e.g., incorporated in a personalized care management process or may be displayed for, e.g., clinical decisions at the point-of-care or patient education.

Having described preferred embodiments of identifying group and individual level risk factors via risk-driven patient stratification (which are intended to be illustrative and not limiting), it is noted that modifications and variations can be made by persons skilled in the art in light of the above teachings. It is therefore to be understood that changes may be made in the particular embodiments disclosed which are within the scope of the invention as outlined by the appended claims. Having thus described aspects of the invention, with the details and particularity required by the patent laws, what is claimed and desired protected by Letters Patent is set forth in the appended claims.

What is claimed is:
 1. A system for individual risk factor identification, comprising: a selection module configured to identify common risk factors for one or more risk targets from population data; a clustering module configured to stratify individuals into clusters based upon the common risk factors; and a ranking module configured to determine, using a processor, a discriminability of each of the common risk factors for a target cluster using individual data of the target cluster to provide re-ranked common risk factors as individual risk factors for the target cluster, such that the discriminability is a measure of how a risk factor discriminates its cluster from other clusters.
 2. The system as recited in claim 1, further comprising a group identification module configured to identify the clusters as one of a plurality of risk levels.
 3. The system as recited in claim 2, wherein the plurality of risk levels include at least one low-risk cluster and at least one high-risk cluster.
 4. The system as recited in claim 2, wherein the group identification module is further configured to identify the clusters as one of a plurality of risk levels based upon a proportion of at-risk individuals in each cluster.
 5. The system as recited in claim 4, wherein the group identification module is configured to identify at-risk individuals based upon a risk score.
 6. The system as recited in claim 5, wherein the group identification module is configured to assign the risk score using a classifier.
 7. The system as recited in claim 3, wherein the ranking module is further configured to compare risk factors for the target cluster with risk factors for at least one of: other high-risk clusters, low-risk clusters, and a general population.
 8. The system as recited in claim 1, wherein the ranking module is further configured to determine the discriminability by determining contributions of each risk factor in training a classifier.
 9. The system as recited in claim 1, wherein the ranking module is further configured to determine the discriminability by determining a difference in a frequency count-based distribution between each risk factor in the target cluster and the other clusters.
 10. The system as recited in claim 1, further comprising a validation module configured to validate each of the individual risk factors using the individual data by filtering out the common risk factors that do not indicate actual risk.
 11. The system as recited in claim 1, wherein the individual data includes one or more of: diagnosis, lab results, medication, hospitalization records, questionnaire data and genetic information for the target cluster.
 12. A system for individual risk factor identification, comprising: a selection module configured to identify common risk factors for one or more risk targets from population data; a clustering module configured to stratify individuals into clusters based upon the common risk factors; a group identification module configured to identify the clusters as one of a plurality of risk levels including at least one high-risk cluster and at least one low-risk cluster; and a ranking module configured to determine, using a processor, a discriminability of each of the common risk factors for a target cluster using individual data of the target cluster to provide re-ranked common risk factors as individual risk factors for the target cluster, such that the discriminability is a measure of how a risk factor discriminates its cluster from other clusters, the other clusters including at least one of other high-risk clusters, low-risk clusters, and a general population.
 13. The system as recited in claim 12, wherein the group identification module is further configured to identify the clusters as one of a plurality of risk levels based upon a proportion of at-risk individuals in each cluster.
 14. The system as recited in claim 13, wherein the group identification module is further configured to identify at-risk individuals based upon a risk score.
 15. The system as recited in claim 14, wherein the group identification module is further configured to assign the risk score using a classifier.
 16. The system as recited in claim 12, wherein the ranking module is further configured to determine the discriminability by determining contributions of each risk factor in training a classifier.
 17. The system as recited in claim 12, wherein the ranking module is further configured to determine the discriminability by determining a difference in a frequency count-based distribution between each risk factor in the target cluster and the other clusters.
 18. The system as recited in claim 1, further comprising a validation module configured to validate each of the individual risk factors using the individual data by filtering out the common risk factors that do not indicate actual risk.
 19. The system as recited in claim 1, wherein the individual data includes one or more of: diagnosis, lab results, medication, hospitalization records, questionnaire data and genetic information for the target cluster.
 20. A computer readable storage medium comprising a computer readable program for individual risk factor identification, wherein the computer readable program when executed on a computer causes the computer to perform the steps of: identifying common risk factors for one or more risk targets from population data; stratifying individuals into clusters based upon the common risk factors; and determining, using a processor, a discriminability of each of the common risk factors for a target cluster using individual data of the target cluster to provide re-ranked common risk factors as individual risk factors for the target cluster, such that the discriminability is a measure of how a risk factor discriminates its cluster from other clusters.